ABSTRACT

We present a new technique for detecting structure on Mpc scales in the Lyα forest.
The technique is easy to apply in practice since it does not involve absorption line
fitting but is instead based on the statistics of the transmitted flux. It identifies and
assesses the statistical significance of regions of over- or underdense Lyα absorption
and is fairly insensitive to the quality of the data. Using extensive simulations we
demonstrate that the new method is significantly more sensitive to
large-scale structure in the Lyα forest than a traditional two-point correlation function
analysis of fitted absorption lines.
1 INTRODUCTION


Over the past few years new information has emerged that warrants a new investigation into the large-scale clustering
properties of the Lyα forest seen in the spectra of distant QSOs. Observationally, major advances have been achieved with
the help of the HST and the Keck telescope. At the lowest redshifts, where Lyα absorbers are probed by HST, multi-slit
spectroscopy of galaxies in the fields of bright QSOs has resulted in the direct identification of galaxies which produce Lyα
absorption, as evidenced by the anti-correlation between the Lyα equivalent width and the distance of the absorbing galaxy
from the QSO sight-line (Chen et al. 1998; Lanzetta et al. 1995). The correlation appears to extend out to very large distances
(Tripp, Lu, & Savage 1998), where the interpretation may be different and the Lyα absorber is presumably not directly
associated with the ‘identified’ galaxy. Regardless of the interpretation on any scale, it now seems clear that the number
density of Lyα absorbers is larger in those regions of space where galaxies reside, and thus Lyα absorbers trace large-scale
structure.

At high redshift, there is also mounting evidence that a significant fraction of Lyα absorbers is associated with
(proto-)galaxies. The discovery of CIV in 75 per cent of absorbers with $N({\rm HI}) > 3.0 \times 10^{14}$ cm$^{-2}$ at $z \sim 3$ (Songaila & Cowie
1996) challenges the original interpretation of the Lyα forest as a primordial, randomly distributed, intergalactic population.
Furthermore, Fernandez-Soto et al. (1996) showed that Lyα lines with associated weak CIV absorption cluster strongly in
redshift and they concluded that the observed clustering is broadly consistent with that expected for galaxies at $z \sim 2 - 3$
(but see also Songaila & Cowie 1996).


The high redshift observations may be understood theoretically and placed within the context of cosmological structure
formation with the help of numerical simulations. Using a uniform metal enrichment of the IGM of $[{\rm C/H}] \sim -2.5$, produced
by a postulated Population III burst of star formation, Hellsten et al. (1997) found that they could reproduce the observed
mean value of the CIV/HI ratio with numerical simulations of cosmological structure formation. However, the scatter of this
ratio implied an inhomogeneous metallicity distribution in the IGM. Gnedin (1998) subsequently suggested that the dominant
mechanism for the enrichment of the IGM is the merger mechanism, which reproduces both the mean and scatter of the CIV/HI
ratio.

a discrete wavelet transform to perform a space-scale decomposition of the Lyα forest and to demonstrate the existence and
evolution of clusters on scales as large as $20\,h^{-1}$ Mpc. Recently, Williger et al. (1998) reported correlations of Lyα absorbers
over $\sim 36\,h^{-1}$ comoving Mpc in the plane of the sky at $2.15 < z < 3.37$.

Fitting individual absorption lines and computing their two-point correlation function (tpcf) is the most commonly
adopted approach to clustering analysis of the Lyα forest. Pando & Fang (1996) discussed this and other methods based
on line statistics and concluded that a space-scale decomposition is most effective. However, the analysis by Fernandez-Soto
et al. (1996) demonstrates the difficulty of using any sort of analysis based on the statistics of fitted absorption lines. Even in
high resolution spectra blending can mask even very strong clustering, so that any procedure involving identifying
individual absorption lines may severely underestimate the strength and scale of the ‘true’ correlation. In addition, if the
aforementioned numerical simulations are more or less correct then at least the low column density forest does not correspond
to well-defined individual ‘clouds’ since it arises in a fluctuating but continuous medium with small to moderate overdensities.

Ideally we therefore need a statistical method which does not rely on identifying individual lines, and which is free from
any systematic effects associated with line counting. In this paper we introduce a new technique based on the statistical
properties of the transmitted flux. The method is a space-scale decomposition and as such retains spatial information. It
allows us to locate specific structures in the Lyα forest, and to assess their significance relative to a random distribution.
The method is compared to a line counting/tpcf method, and we show that it is substantially more sensitive.

The organisation of this paper is as follows: in section 2 we describe the new analysis and carry out all necessary analytic
calculations. In section 3 we use Monte-Carlo simulations to compare the new method with a tpcf analysis. We present our
conclusions in the final section.


2 TECHNIQUE 


We base our analysis on the null-hypothesis that any Lyα forest spectrum can be fairly well represented by a collection of
individual absorption lines (Carswell et al. 1984; Kirkman & Tytler 1997; Lu et al. 1996; Hu et al. 1995) whose parameters are
uncorrelated. Usually those lines are taken to be Voigt profiles and we shall adopt this although the exact shape of the profile is
not relevant. We also need to adopt the functional form of the distribution of the absorption line parameters, $\eta(z, N, b)$, which
we take from observations. We stress that we make no assumptions about what causes the absorption lines. Our analysis does
not rely on identifying an absorption line with an individual, well-defined absorbing cloud. The decomposition of a spectrum into
individual lines is purely descriptive. We simply use the null-hypothesis to predict integral properties of the absorption caused
by the collection of lines.

The general idea of the new analysis then is to use those predictions to identify over- and underdense regions of absorption
as a function of scale and position (space-scale decomposition) and to assess their statistical significance. This is implemented
by using a matched filter technique: in order to obtain an estimate of the mean transmission we simply convolve a normalised
spectrum (of $N_p$ pixels) with a smoothing function of scale $\sigma_s$ and repeat this process for all possible scales ($\sigma_s = 1, \ldots, N_p$).
When plotted in the $(\lambda, \sigma_s)$ plane, this procedure results in the ‘transmission triangle’ of the spectrum. When using a top hat
function as the smoothing function the base of the transmission triangle is the spectrum itself (the original spectrum smoothed
by a top hat of width $\sigma_s = 1$ pixel) and the tip of the triangle is $1 - D_A$ (Oke & Korycansky 1982) (the original spectrum
smoothed by a top hat of width $\sigma_s = N_p$ pixels). Since we are only interested in local fluctuations of the transmission around
the mean, we then subtract out the mean as calculated on the basis of our null-hypothesis. Essentially, this removes the global
redshift evolution of the optical depth. The statistical significance of any remaining residual fluctuations around zero is then
assessed in terms of the expected rms as a function of wavelength and scale.
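
As an illustration of the construction described above, the following Python sketch builds a top hat transmission triangle from a normalised spectrum. It is only a minimal sketch: the function name is our own, and indexing each smoothed value by the first pixel of its window (rather than by the window centre) is an arbitrary choice made here for simplicity.

```python
import numpy as np


def transmission_triangle(flux):
    """Top hat transmission triangle of a normalised spectrum.

    Row s-1 holds the spectrum smoothed by a top hat of width s pixels.
    Entry [s-1, i] is the mean of flux[i:i+s]; positions without a full
    window are NaN, which produces the triangular shape.
    """
    flux = np.asarray(flux, dtype=float)
    n_pix = flux.size
    # Cumulative sum with a leading zero gives fast window sums.
    csum = np.concatenate(([0.0], np.cumsum(flux)))
    triangle = np.full((n_pix, n_pix), np.nan)
    for scale in range(1, n_pix + 1):
        n_windows = n_pix - scale + 1
        window_sums = csum[scale:] - csum[:n_windows]
        triangle[scale - 1, :n_windows] = window_sums / scale
    return triangle
```

The first row reproduces the spectrum itself (the base of the triangle) and the single finite entry of the last row is the mean transmission of the whole spectrum (the tip).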

In the rest of this section we calculate the relevant quantities. The work presented here is developed from earlier calculations
carried out by Zuo & Phinney (1995), Zuo (1993), and Zuo & Bond (1994) (but see also Press, Rybicki, & Schneider 1993).
For completeness and clarity we reiterate some of their derivations here. When considering the expected mean transmission
and its variance it is helpful to introduce the concept of transmission probability. The idea is to view a Lyα forest spectrum
as a stochastic process (Press, Rybicki, & Schneider 1993). Every point in the spectrum is a random variable, $e^{-\tau}$,
drawn from the transmission probability density function $f_\lambda$, also known as the flux decrement distribution function (Rauch
et al. 1997; Kim et al. 1997) or distribution of intensities (Jenkins & Ostriker 1991; Webb et al. 1992). In principle, we have a
different probability density function at each wavelength such that e.g. the moments of $f_\lambda$ are functions of wavelength. There
is a small and subtle difference between the transmission probability density function and the distribution of pixel intensities
of a spectrum: $f_\lambda$ should in principle be measured by constructing the frequency distribution of pixel intensities at $\lambda$ (and
only at $\lambda$) of many different spectra. Although this is important to note we shall see later that at least the first and second
moments of $f_\lambda$ are only slowly varying functions of $\lambda$ so that in many calculations we can approximate $e^{-\tau}$ as a stationary
stochastic process.

2.1 The mean Lyα transmission

Given the distribution of absorption line parameters, $\eta(z, N, b)$, what is the mean transmission at a given wavelength?
We can define an effective optical depth, $\tau_{\rm eff}$, as a function of observed wavelength, $\lambda$, by

$$e^{-\tau_{\rm eff}(\lambda)} = \langle e^{-\tau(\lambda)} \rangle. \qquad (1)$$

In the following we will neglect any contribution to $\tau_{\rm eff}$ from the classical Gunn-Peterson effect which is limited to $\tau_{\rm GP} < 0.04$
(Webb et al. 1992). If the number of absorption lines per sight-line is Poisson distributed with a mean of
$m = \int_0^\infty \int_0^\infty \int_{z_1}^{z_2} \eta(z, N, b)\, dz\, dN\, db$ then we have

$$\langle e^{-\tau} \rangle = \sum_{k=0}^{\infty} p(k; m)\, \langle e^{-\tau_s} \rangle^k, \qquad (2)$$

where $p(k; m) = e^{-m} m^k / k!$ and

$$\langle e^{-\tau_s} \rangle = \frac{1}{m} \int_0^\infty \int_0^\infty \int_{z_1}^{z_2} \eta(z, N, b)\, e^{-\tau_s(\lambda_z; N, b)}\, dz\, dN\, db. \qquad (3)$$

$\tau_s(\lambda_z; N, b)$ is the profile of a single absorption line at $z, N, b$ where $\lambda_z = \lambda/(1 + z)$. After some algebra we find

$$\tau_{\rm eff} = m\left(1 - \langle e^{-\tau_s} \rangle\right) = \int \eta \left(1 - e^{-\tau_s(\lambda_z)}\right) dz\, dN\, db. \qquad (4)$$
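
The step from a Poisson distributed number of lines to $\tau_{\rm eff} = m(1 - \langle e^{-\tau_s} \rangle)$ can be checked numerically. The following Python sketch is a toy experiment under assumptions of our own: lines are placed uniformly in a one-dimensional box, all lines share the same Gaussian optical depth profile, and the transmission is evaluated at the box centre. None of the parameter names or values are taken from the analysis itself.

```python
import numpy as np


def mean_transmission_mc(m, tau0, width, box, n_real, rng):
    """Monte-Carlo mean of e^{-tau} at x = 0 for Poisson-distributed lines."""
    total = 0.0
    for _ in range(n_real):
        n_lines = rng.poisson(m)                      # number of lines in this sight-line
        x = rng.uniform(-box / 2, box / 2, size=n_lines)
        tau = np.sum(tau0 * np.exp(-x**2 / (2 * width**2)))
        total += np.exp(-tau)
    return total / n_real


def mean_transmission_analytic(m, tau0, width, box, n_grid=20001):
    """exp(-m (1 - <e^{-tau_s}>)) with <e^{-tau_s}> averaged over line position."""
    x = np.linspace(-box / 2, box / 2, n_grid)
    single = np.mean(np.exp(-tau0 * np.exp(-x**2 / (2 * width**2))))
    return np.exp(-m * (1.0 - single))
```

For a fixed random seed the two estimates should agree to within the Monte-Carlo error, which shrinks as the number of realisations grows.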
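If we exclude strongly saturated and damped systems from our analysis then $\tau_s(\lambda_z)$ peaks sharply at $\lambda_z = \lambda_\alpha = 1215.67$ Å
so that

$$\tau_{\rm eff} = \frac{1 + z_{\rm abs}}{\lambda_\alpha} \int\!\!\int \eta(z_{\rm abs}, N, b) \int_{\lambda/(1+z_2)}^{\lambda/(1+z_1)} \left(1 - e^{-\tau_s(\lambda_z; N, b)}\right) d\lambda_z\, dN\, db, \qquad (5)$$

where $z_{\rm abs} = \lambda/\lambda_\alpha - 1$. Usually, $z_1 = \lambda_\beta/\lambda_\alpha\,(1 + z_{\rm em}) - 1$, where $\lambda_\beta = 1025.72$ Å, and in the absence of a proximity effect
$z_2 = z_{\rm em}$. For $\lambda$ close to $\lambda_\alpha(1 + z_{\rm em})$, there are fewer than average absorption lines longward of $\lambda$. This produces an ‘edge
effect’, superimposed on the well-known proximity effect (Weymann, Carswell, & Smith 1981; Cooke, Espey, & Carswell 1997).
Similarly, there will be a reverse edge effect for $\lambda$ close to $\lambda_\beta(1 + z_{\rm em})$ because of the additional absorption by Lyβ lines. If
$\lambda$ falls well away from these limits then we can extend the upper and lower integration limits in (5) to $\infty$ and 0 respectively
because if $\lambda_z$ and $\lambda_\alpha$ are sufficiently far apart $1 - e^{-\tau_s}$ is zero. Thus we have

$$\tau_{\rm eff} = \frac{1 + z_{\rm abs}}{\lambda_\alpha} \int\!\!\int \eta(z_{\rm abs}, N, b)\, W(N, b)\, dN\, db, \qquad (6)$$

where $W(N, b) = \int_0^\infty \left(1 - e^{-\tau_s(\lambda_z; N, b)}\right) d\lambda_z$ is the rest equivalent width of a line. Observationally $\eta$ is found to be of the
form $\eta(z, N, b) = (1 + z)^\gamma F(N, b)$ (Kim et al. 1997; Lu et al. 1996; Bechtold 1994;
Williger et al. 1994; Bahcall et al. 1993). We therefore arrive at

$$\tau_{\rm eff} = B\,(1 + z_{\rm abs})^{\gamma+1} = B \left(\frac{\lambda}{\lambda_\alpha}\right)^{\gamma+1}, \qquad (7)$$

where

$$B = \frac{1}{\lambda_\alpha} \int_0^\infty \int_0^\infty F(N, b)\, W(N, b)\, dN\, db. \qquad (8)$$

In practice, we compute $B$ directly from the data for reasons described in section 3. Thus we have

$$\langle e^{-\tau} \rangle = e^{-B (\lambda/\lambda_\alpha)^{\gamma+1}}. \qquad (9)$$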
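
To make the power-law form of the effective optical depth concrete, the following Python sketch evaluates $\tau_{\rm eff} = B(\lambda/\lambda_\alpha)^{\gamma+1}$ and the corresponding mean transmission $e^{-\tau_{\rm eff}}$. The function names are our own, and any value of $B$ passed to them is a placeholder, since in practice $B$ is measured from the data.

```python
import numpy as np

LAMBDA_ALPHA = 1215.67  # rest wavelength of Lyα in Å


def effective_optical_depth(wavelength, B, gamma):
    """tau_eff = B (lambda / lambda_alpha)^(gamma + 1)."""
    ratio = np.asarray(wavelength, dtype=float) / LAMBDA_ALPHA
    return B * ratio ** (gamma + 1.0)


def mean_transmission(wavelength, B, gamma):
    """Expected transmission <e^{-tau}> = exp(-tau_eff)."""
    return np.exp(-effective_optical_depth(wavelength, B, gamma))
```

Because $\gamma + 1 > 0$, the expected transmission falls monotonically towards longer observed wavelengths, i.e. towards higher absorption redshift.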


2.2 The auto-covariance 

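The auto-covariance function of the transmission is given by

$$\gamma_{e^{-\tau}}(\lambda, \lambda') = \left\langle \left(e^{-\tau(\lambda)} - \langle e^{-\tau(\lambda)} \rangle\right) \left(e^{-\tau(\lambda')} - \langle e^{-\tau(\lambda')} \rangle\right) \right\rangle
= \langle e^{-\tau(\lambda)} e^{-\tau(\lambda')} \rangle - e^{-\tau_{\rm eff}(\lambda)} e^{-\tau_{\rm eff}(\lambda')}
= e^{-\tau_2(\lambda, \lambda')} - e^{-\tau_{\rm eff}(\lambda)} e^{-\tau_{\rm eff}(\lambda')}. \qquad (10)$$

Following the same calculations as in the previous section, we find

$$\tau_2(\lambda, \lambda') = \int \eta(z, N, b) \left(1 - e^{-\tau_s(\lambda_z)} e^{-\tau_s(\lambda'_z)}\right) dz\, dN\, db. \qquad (11)$$

Let us consider the variance of the transmission given by

$$\sigma^2_{e^{-\tau}} = \gamma_{e^{-\tau}}(\lambda, \lambda). \qquad (12)$$

Since $\tau_s(N) \propto N$, we have $2\tau_s(N) = \tau_s(2N)$ and thus we get similarly to equation (7)

$$\tau_2(\lambda, \lambda) = B' (1 + z_{\rm abs})^{\gamma+1}, \qquad (13)$$

where

$$B' = \frac{1}{\lambda_\alpha} \int_0^\infty \int_0^\infty F(N, b)\, W(2N, b)\, dN\, db. \qquad (14)$$

Observations have shown that the distribution of column densities can be fairly well represented by a power law, $F(N, b) =
N^{-\beta} f(b)$ (Carswell et al. 1984) with $\beta \approx 1.5$ (Kim et al. 1997; Kirkman & Tytler 1997; Lu et al. 1996; Hu et al. 1995). A
finite number of absorption lines per line of sight implies that the power law must break off at the low $N$ end at some $N_{\rm low}$.
We also expect a break at the high $N$ end at some $N_{\rm hi}$. Thus we have

$$B' = \frac{2^{\beta-1}}{\lambda_\alpha} \int_0^\infty \int_{2N_{\rm low}}^{2N_{\rm hi}} F(N, b)\, W(N, b)\, dN\, db. \qquad (15)$$

We know that the power law is a good approximation for the range $12 \lesssim \log N/{\rm cm}^{-2} \lesssim 22$ (Hu et al. 1995; Petitjean et al.
1993) so that $N_{\rm low}$ and $N_{\rm hi}$ are in the linear and square-root regimes of the curve of growth respectively. Under this assumption
it is straightforward to show that $B'$ can be well approximated by $2^{\beta-1} B$ for $\beta \lesssim 1.8$, this being the exact result (for all $\beta$) if
there are no breaks in the power law. Therefore we finally arrive at

$$\sigma^2_{e^{-\tau}} = e^{-2^{\beta-1} B (1 + z_{\rm abs})^{\gamma+1}} - e^{-2B (1 + z_{\rm abs})^{\gamma+1}}. \qquad (16)$$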
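
The variance of the transmission derived in this section, $\sigma^2_{e^{-\tau}} = e^{-2^{\beta-1}\tau_{\rm eff}} - e^{-2\tau_{\rm eff}}$ with $\tau_{\rm eff} = B(\lambda/\lambda_\alpha)^{\gamma+1}$, is easily evaluated numerically. The Python sketch below does so; the function name is our own, and it assumes the unbroken power-law approximation in which the second normalisation equals $2^{\beta-1}B$.

```python
import numpy as np

LAMBDA_ALPHA = 1215.67  # rest wavelength of Lyα in Å


def transmission_variance(wavelength, B, gamma, beta):
    """Variance of e^{-tau} for an unbroken power law N^{-beta}.

    Uses sigma^2 = exp(-2^(beta-1) tau_eff) - exp(-2 tau_eff),
    where tau_eff = B (lambda / lambda_alpha)^(gamma + 1).
    """
    tau_eff = B * (np.asarray(wavelength, dtype=float) / LAMBDA_ALPHA) ** (gamma + 1.0)
    return np.exp(-(2.0 ** (beta - 1.0)) * tau_eff) - np.exp(-2.0 * tau_eff)
```

Two limits serve as sanity checks: the variance vanishes for $\beta = 2$, and for $\beta \ge 1$ it never exceeds $\mu(1 - \mu)$, the largest variance possible for a variable confined to $[0, 1]$ with mean $\mu = e^{-\tau_{\rm eff}}$.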


2.3 Instrumental effects 

So far we have not considered any instrumental effects. There are two classes of such effects: finite spectral resolution and 
various sources of noise. 


2.3.1 Finite resolution 

A new stochastic variable $X$ is produced by convolving $e^{-\tau}$ with a line spread function (LSF) $L$:

$$X(\lambda) = \int_{-\infty}^{\infty} e^{-\tau(\lambda')} L(\lambda - \lambda')\, d\lambda'. \qquad (17)$$

For the mean of $X$ we get

$$\langle X \rangle(\lambda) = \int_{-\infty}^{\infty} \langle e^{-\tau(\lambda')} \rangle L(\lambda - \lambda')\, d\lambda'. \qquad (18)$$

Although it has been stressed that the mean and the variance of $e^{-\tau}$ are functions of $\lambda$ we will now approximate $e^{-\tau}$ as a
stationary stochastic process because both the mean and the variance are smooth, slowly varying functions of $\lambda$. Lu & Zuo
(1994) have shown this approximation to be valid. Thus we have

$$\langle X \rangle(\lambda) \simeq \langle e^{-\tau(\lambda)} \rangle \qquad (19)$$

since the LSF is normalised to unity. As is intuitively clear, the convolution does not change the mean transmission.
The auto-covariance function of $X$ is given by

$$\gamma_X(\lambda_1, \lambda_2) = \left\langle \left(X(\lambda_1) - \langle X \rangle(\lambda_1)\right) \left(X(\lambda_2) - \langle X \rangle(\lambda_2)\right) \right\rangle
= \int\!\!\int L(\lambda_1 - \lambda'_1)\, L(\lambda_2 - \lambda'_2)\, \gamma_{e^{-\tau}}(\lambda'_1, \lambda'_2)\, d\lambda'_1\, d\lambda'_2. \qquad (20)$$

Since we consider $e^{-\tau}$ to be a stationary process, $\gamma_{e^{-\tau}}$ depends only on $u' = \lambda'_2 - \lambda'_1$ and $\gamma_X$ depends only on $u = \lambda_2 - \lambda_1$
(Jenkins & Watts 1968). Usually, the LSF can be well approximated as a Gaussian. After some algebra we get

$$\gamma_X(u) = \frac{1}{\sqrt{2\pi}\, \sigma'_{\rm LSF}} \int_{-\infty}^{\infty} \gamma_{e^{-\tau}}(u') \exp\left(-\frac{(u - u')^2}{2 \sigma'^2_{\rm LSF}}\right) du', \qquad (21)$$

where $\sigma'_{\rm LSF} = \sqrt{2}\, \sigma_{\rm LSF}$.

2.3.2 Noise

The noise in optical spectra is mainly due to photon counting statistics, detector read-out noise, dark current, sky subtraction,
and cosmic rays. As the Poisson statistics of the absorption lines are expected to dominate the variance we have not attempted
to model the noise characteristics in great detail. Instead we approximate the cumulative effect of all the noise components
mentioned above to be Gaussian. Therefore, we define the stochastic variable $Y$ by

$$Y(\lambda) = X(\lambda) + n(X(\lambda)), \qquad (22)$$

where $n$ is a random variable drawn from a Gaussian with mean zero and variance $\sigma_n^2(X) = X (c_1 - c_2) + c_2$. The constants
$c_1$ and $c_2$ characterise the photon counting statistics and the sky subtraction plus detector noise ($c_1 > c_2$). For the mean of
$Y$ we have

$$\langle Y \rangle = \langle X \rangle + \langle n \rangle = \langle X \rangle = \langle e^{-\tau} \rangle \qquad (23)$$

and the covariance is given by

$$\gamma_Y(\lambda_1, \lambda_2) = \gamma_X(\lambda_1, \lambda_2) + \gamma_{Xn}(\lambda_1, \lambda_2) + \gamma_{Xn}(\lambda_2, \lambda_1) + \gamma_n(\lambda_1, \lambda_2). \qquad (24)$$

$\gamma_{Xn}$ denotes the cross-covariance function of $X$ and $n$. Although $X$ and $n$ are not independent they are, by construction,
uncorrelated, so that $\gamma_{Xn} = 0$. Zuo & Bond (1994) showed that the originally uncorrelated photon noise in different wavelength
bins remains uncorrelated after passing through a spectrograph of finite resolution. Therefore $\gamma_n(u)$ must be discontinuous at
$u = 0$:

$$\gamma_n(u) = \begin{cases} 0 & u > 0 \\ \int f_X(x)\, \sigma_n^2(x)\, dx & u = 0 \end{cases} \qquad (25)$$

where $f_X(x)$ denotes the pdf of $X$. The integral reduces to $\sigma_n^2(\langle X \rangle)$. Thus

$$\sigma_Y^2 = \sigma_X^2 + \sigma_n^2(\langle X \rangle), \qquad \gamma_Y(u) = \gamma_X(u) \quad (u > 0). \qquad (26)$$
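
The two instrumental effects of this section can be mimicked in a few lines of Python. The sketch below convolves a perfect transmission spectrum with a Gaussian LSF and then adds Gaussian noise whose variance is linear in the smoothed transmission, $X(c_1 - c_2) + c_2$. The function names, the truncation of the kernel at four standard deviations and the treatment of the spectrum edges (NumPy's `same` convolution mode, which implicitly pads with zeros) are our own simplifications.

```python
import numpy as np


def gaussian_lsf(sigma_pix):
    """Gaussian line spread function sampled on pixels, normalised to unity."""
    half = int(np.ceil(4 * sigma_pix))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma_pix**2))
    return kernel / kernel.sum()


def observe(transmission, sigma_pix, c1, c2, rng):
    """Return (X, Y): LSF-convolved spectrum and its noisy counterpart."""
    x_conv = np.convolve(transmission, gaussian_lsf(sigma_pix), mode="same")
    noise_var = x_conv * (c1 - c2) + c2      # variance depends on the local flux
    y_obs = x_conv + rng.normal(0.0, np.sqrt(noise_var))
    return x_conv, y_obs
```

For a continuum pixel ($X = 1$) the noise variance is $c_1$, so a signal-to-noise ratio of 20 in the continuum corresponds to $c_1 = 1/400$ in this parametrisation.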


2.4 Filter matching 

In order to develop a method for detecting structures of arbitrary scale, we proceed by convolving the spectrum with a
smoothing function of smoothing scale $\sigma_s$. The convolution filters out all power on scales smaller than $\sigma_s$. By changing the
width of the smoothing function we can match the filter width to the scale of any feature and thus maximise its signal. In
practice, we perform the convolution successively at all possible smoothing scales. At the largest possible scale ($\sigma_{s,\max}$ =
number of pixels in the spectrum) the entire spectrum is compressed into a single number whereas on the smallest scale
($\sigma_{s,\min} = 1$ pixel) the spectrum remains essentially unchanged. These two extremes correspond to the tip and the base of the
triangle which forms when the successive convolutions of the spectrum are plotted in the $(\lambda, \sigma_s)$ plane. In principle, there are
many choices for the specific form of the smoothing function but for simplicity we will use a Gaussian, thus constructing a
new stochastic variable $G$:

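$$G(\lambda, \sigma_s) = \frac{1}{\sqrt{2\pi}\, \sigma_s} \int_{-\infty}^{\infty} Y(\lambda') \exp\left(-\frac{(\lambda - \lambda')^2}{2\sigma_s^2}\right) d\lambda'. \qquad (27)$$

As in section 2.3.1 we have

$$\langle G \rangle(\lambda) \simeq e^{-\tau_{\rm eff}(\lambda)} = e^{-B (1 + z_{\rm abs})^{\gamma+1}}. \qquad (28)$$

Note that the use of a top hat function would yield a variable akin to $1 - D_A$ (and the same result as equation (28)), where
$D_A$ is the flux deficit parameter (Oke & Korycansky 1982). The observations are consistent with this result (Press, Rybicki
& Schneider 1993; Zuo & Lu 1993; but see also Bi & Davidsen 1997). Similar to equation (21) we find

$$\gamma_G(u) = \frac{1}{\sqrt{2\pi}\, \sigma'_s} \int_{-\infty}^{\infty} \gamma_Y(u') \exp\left(-\frac{(u - u')^2}{2\sigma'^2_s}\right) du'
= \frac{\sigma_n^2(e^{-\tau_{\rm eff}})}{\sqrt{2\pi}\, \sigma'_s / {\rm ps}} \exp\left(-\frac{u^2}{2\sigma'^2_s}\right)
+ \frac{1}{\sqrt{2\pi} \sqrt{\sigma'^2_s + \sigma'^2_{\rm LSF}}} \int_{-\infty}^{\infty} \gamma_{e^{-\tau}}(u'') \exp\left(-\frac{(u - u'')^2}{2(\sigma'^2_s + \sigma'^2_{\rm LSF})}\right) du'', \qquad (29)$$

with $\sigma'_s = \sqrt{2}\, \sigma_s$ and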

where ps denotes the pixel size in Å. To proceed we need to consider the auto-covariance function of a ‘perfect’ spectrum,
$\gamma_{e^{-\tau}}$, in more detail. In principle, it can be calculated from equation (10) as was done by Zuo & Bond (1994) for a single
Doppler parameter rather than a distribution of $b$ values. The result is a rather unwieldy numerical integral. Here we can take
a different approach. As expected, we can see from equation (29) that the quantity that we are interested in, $\sigma_G^2 = \gamma_G(0)$,
does not depend on the exact shape of $\gamma_{e^{-\tau}}$ but rather on the convolution of $\gamma_{e^{-\tau}}$ with a Gaussian. We may therefore hope to
be able to use a simpler analytic approximation for $\gamma_{e^{-\tau}}$ since all systematic differences will be somewhat ‘washed out’ by the
convolution. Ultimately, this procedure must be justified by its success. We shall return to this point when we compare the
results of this section to simulations. The most obvious (because simplest) approximation for $\gamma_{e^{-\tau}}$ is a Gaussian, especially
when considering that unsaturated Voigt profiles are very nearly Gaussian:

$$\gamma_{e^{-\tau}}(u) = \sigma^2_{e^{-\tau}} \exp\left(-\frac{u^2}{2q^2}\right). \qquad (30)$$

Since we are operating in wavelength space rather than in velocity space the width, $q$, must be a function of wavelength,
because an absorption line with a given Doppler parameter will be wider in wavelength space at higher redshifts than at lower
redshifts. This is of course just another reflection of the fact that $e^{-\tau}$ is not a stationary process. But again, $q$ will vary only
slowly with wavelength (approximately linearly) so that the stationary approximation is valid. Using this approximation we
find

$$\sigma_G^2(\lambda, \sigma_s) = \frac{\sigma_n^2(e^{-\tau_{\rm eff}})}{2\sqrt{\pi}\, \sigma_s / {\rm ps}} + \frac{\sigma^2_{e^{-\tau}}}{\sqrt{2(\sigma_s^2 + \sigma_{\rm LSF}^2)/q^2 + 1}}. \qquad (31)$$

Equations (28) and (31) are the final result of this section.
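
The expected rms of the smoothed transmission can be evaluated with a short Python function. The sketch below assumes the Gaussian approximation for the auto-covariance of the perfect spectrum (width $q$), a Gaussian LSF, the linear noise model with constants $c_1$ and $c_2$, and the unbroken power-law form of the transmission variance. The function name and argument list are our own, and all lengths (smoothing scale, LSF width, $q$, pixel size) must be given in the same units.

```python
import numpy as np

LAMBDA_ALPHA = 1215.67  # rest wavelength of Lyα in Å


def sigma_G(wavelength, sigma_s, B, gamma, beta, q, sigma_lsf, c1, c2, pixel_size):
    """Expected rms of the Gaussian-smoothed transmission at scale sigma_s."""
    tau_eff = B * (np.asarray(wavelength, dtype=float) / LAMBDA_ALPHA) ** (gamma + 1.0)
    mean_t = np.exp(-tau_eff)
    # Variance of the perfect spectrum (unbroken power law in column density).
    var_t = np.exp(-(2.0 ** (beta - 1.0)) * tau_eff) - np.exp(-2.0 * tau_eff)
    # Noise variance evaluated at the mean transmission.
    var_noise = mean_t * (c1 - c2) + c2
    noise_term = var_noise / (2.0 * np.sqrt(np.pi) * sigma_s / pixel_size)
    forest_term = var_t * q / np.sqrt(2.0 * sigma_s**2 + 2.0 * sigma_lsf**2 + q**2)
    return np.sqrt(noise_term + forest_term)
```

In the limit of no noise, perfect resolution and vanishing smoothing scale the function returns the rms of the unsmoothed transmission, and it decreases as the smoothing scale grows.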


3 SIMULATIONS 


The motivation for simulations of Lyα forest spectra in this work is threefold. First of all we need to determine the parameters
$B$ and $q$. The normalisation $B$ could be calculated numerically from equation (8). However, it is clear that for real data
small inaccuracies in the zeroth and first order of the continuum fit will cause an artificial offset of the measured mean
transmission from the calculated one. In anticipation of this problem we choose to determine $B$ directly from the data. Since
equation (30) is only an approximation we cannot a priori calculate a precise value for $q$. We therefore have to measure it
from simulations. Secondly, we would like to check the validity of equations (28) and (31) by comparing the calculations with
an analysis of simulated spectra. Thirdly, we would like to compare the sensitivity of the new analysis to the presence of
non-random structures to that of the traditional line counting technique. In order to cater for this third need, we employed
a more sophisticated method than simply randomly drawing the parameters of absorption lines from a given distribution
$\eta(z, N, b)$. Instead we distribute absorbers in a cosmological volume and take lines of sight through that volume. This provides
the flexibility of introducing specific types of clustering models. We assume absorbers to be spherical and prescribe a column
density - impact parameter relationship of the form $N(r) = N_0 (r/r_0)^{-\alpha}$ which has been observed at low redshift where
galaxies are unambiguously associated with Lyα absorbers (Chen et al. 1998; Lanzetta, Webb, & Barcons 1996; Lanzetta
et al. 1995; but see also Bowen, Blades, & Pettini 1996). This procedure simply ensures that the column density distribution
of the absorption lines will be of the form $N^{-\beta}$ with $\beta = 2/\alpha + 1$. We draw Doppler parameters from a truncated Gaussian.
We choose to keep the comoving number density of absorbers constant and thus ascribe their redshift evolution solely to the
evolution of their absorption cross-section. This requires a redshift dependence of $r_0$

$$r_0(z) = r_0(\bar{z}) \left(\frac{1 + 2 q_0 z}{1 + 2 q_0 \bar{z}}\right)^{1/4} \left(\frac{1 + z}{1 + \bar{z}}\right)^{(\gamma - 1)/2}, \qquad (32)$$

where we take the normalisation $r_0(\bar{z}) = 1\,h^{-1}$ Mpc at $N_0$ from Lanzetta et al. (1995) at $\bar{z} = 0.5$.
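
The link between the column density - impact parameter relation and the slope of the column density distribution can be verified with a short numerical experiment. In the Python sketch below sight-lines hit a spherical absorber at impact parameters distributed uniformly in area out to a maximum radius (our own assumption for the toy geometry), each hit is assigned $N = N_0 (r/r_0)^{-\alpha}$, and the power-law index of the resulting column densities is recovered with a maximum-likelihood estimator. All names and parameter values are illustrative only.

```python
import numpy as np


def draw_column_densities(n_lines, alpha, r_max, N0, r0, rng):
    """Column densities for impact parameters uniform in area within r_max."""
    # 1 - U lies in (0, 1], which avoids r = 0 and hence infinite N.
    r = r_max * np.sqrt(1.0 - rng.uniform(size=n_lines))
    return N0 * (r / r0) ** (-alpha)


def powerlaw_index_mle(N, N_min):
    """Maximum-likelihood index beta of a pure power law N^{-beta}, N >= N_min."""
    N = np.asarray(N, dtype=float)
    return 1.0 + N.size / np.sum(np.log(N / N_min))
```

The smallest column density in such a sample is $N_{\min} = N_0 (r_{\max}/r_0)^{-\alpha}$, and the estimator should return a value close to $2/\alpha + 1$.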


3.1 B and q from simulations 

In order to compare equations (28) and (31) with simulations we have produced a set of 1000 spectra in the manner described
in the previous section with randomly distributed absorbers. The spectra are convolved with a line spread function and noise
is added according to equation (22). The parameters of the simulation are listed in Table 1 (S1). For each spectrum we
constructed its transmission triangle using a Gaussian smoothing function. From this set of 1000 triangles we produced the
mean and rms transmission triangles which are shown in Figures 1 and 2. Before we can go on to compare these results
with equations (28) and (31) we must determine the values of the two parameters $B$ and $q$. We fix the normalisation $B$ at
the tip of the mean transmission triangle by requiring

$$\langle G \rangle(\sigma_{s,\max}, \lambda_c) = e^{-B (\lambda_c/\lambda_\alpha)^{\gamma+1}}, \qquad (33)$$

where $\sigma_{s,\max}$ denotes the biggest possible smoothing scale and $\lambda_c$ is the central wavelength of the region under consideration.
Having stipulated equation (30) we measure $q$ (at $\lambda_c$) from the simulations by performing a single parameter fit of the
function

Figure 1. Mean transmission triangle produced from 1000 simulated spectra.


Table 1. Parameters of simulations.

Model  γ    β    n_0           μ_b        σ_b        b_cut      S/N  FWHM_LSF
                 (h^3 Mpc^-3)  (km s^-1)  (km s^-1)  (km s^-1)       (Å)
S1     2.5  1.5  0.01          30         8          18         20   2
S2     2.5  1.7  0.01          30         8          18         20   2
S3     2.7  1.5  0.01          30         8          18         20   2
S4     2.5  1.5  0.015         30         8          18         20   2
S5     2.5  1.5  0.01          50         8          38         20   2
S6     2.5  1.5  0.01          30         16         18         20   2
S7     2.5  1.5  0.01          30         8          18         20   0.5
S8     2.5  1.5  0.01          30         8          18         5    2

n_0 is the comoving number density of absorbers (normalisation of $\eta(z, N, b)$). μ_b, σ_b
and b_cut are the mode, width, and lower cut-off of the Doppler parameter distribution
respectively. For models S1 and S7 we created 1000 spectra, in all other cases
we simulated 100 spectra. For all spectra $\langle z \rangle = 2.87$.




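$$\gamma_Y(u) = \begin{cases}
\dfrac{\sigma^2_{e^{-\tau}}}{\sqrt{2\sigma_{\rm LSF}^2/q^2 + 1}} + \sigma_n^2(e^{-\tau_{\rm eff}}) & u = 0 \\[2ex]
\dfrac{\sigma^2_{e^{-\tau}}}{\sqrt{2\sigma_{\rm LSF}^2/q^2 + 1}} \exp\left(-\dfrac{u^2}{2(2\sigma_{\rm LSF}^2 + q^2)}\right) & u > 0
\end{cases} \qquad (34)$$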




J. Liske et al. 


[Figure 2 image: horizontal axis Wavelength (Å); grey scale log($\sigma_G$).]


Figure 2. Rms transmission triangle produced from 1000 simulated spectra. 


to the mean auto-covariance function of the 1000 simulated spectra. Since equation (30) (and hence equation (34)) is an
approximation we do not a priori expect a statistically acceptable fit. Nevertheless, in practice this procedure provides a
reliable estimate of $q$ because both the shape (width) and normalisation of $\gamma_Y$ are sensitive to $q$. Figure 3 shows the
measured mean auto-covariance function of S1 and its fit. The same is also plotted for two other sets of spectra (c.f. Table 1),
S7 (same model as S1 but the spectra are of higher resolution) and S5 (larger mode of the Doppler parameter distribution).
It is evident that a Gaussian does not adequately represent the auto-covariance functions; a Gaussian has too much power
on small scales and too little power on larger scales. Indeed, each fit produces an unacceptably large $\chi^2$, although we point
out that in any case a somewhat larger than usual $\chi^2$ must be anticipated because of the non-Gaussian and correlated nature
of the measurement errors of $\gamma_Y$. However, we recall that we are mostly interested in the typical width and strength of the
correlation rather than its exact shape. Since both sets of spectra S1 and S7 should yield the same value for $q$, the purpose
of set S7 was to check whether the above method of determining $q$ is robust and to provide an estimate of the true error in $q$
as opposed to the formal error as calculated from the fit. As expected, $q$ is of the order of the mode Doppler parameter,
$\mu_b$, for a range of sensible values for $\mu_b$, as seen from S5. In fact, $q$ is seen to vary almost linearly with $\mu_b$, which justifies
$q(\lambda) = q(\lambda_c)\,\lambda/\lambda_c$. We have also investigated the behaviour of $q$ as a function of the other model parameters and have found,
as expected, that $q$ is only sensitive to the parameters of the Doppler parameter distribution, $\mu_b$ and $\sigma_b$, and of the column
density distribution, $\beta$. It is insensitive to the redshift evolution, overall normalisation, and the quality of spectra since the $q$
values measured from models S3, S4, S7, and S8 are all comparable. We conclude that the error in estimating $q$ is dominated
by the errors in $\mu_b$, $\sigma_b$ and $\beta$.

Figure 3. Auto-covariance functions of different models as indicated and best fits (solid lines). For clarity, all errorbars have been
exaggerated by a factor of 10.


3.2 Comparison of analytical to numerical results 


With the values of $B$ and $q$ thus determined we can now directly compare the results from the simulations with equations
(28) and (31). Figures 4 and 5(a) show cross sections of the mean and rms transmission triangles of S1 as functions of
wavelength at smoothing scale FWHM$_s$ = 3.2 Å. Figure 5(b) shows a cross section through the rms transmission triangle as
a function of smoothing scale at $z = 2.87$. The dashed lines show the calculations. Using the covariance matrix implied by
equation (34), a $\chi^2$ test performed on the base of the mean transmission triangle yields $P(> \chi^2) = 0.12$ and thus the model
agrees very well with the simulations. For the rms the agreement is not quite as good. We find that for very large smoothing
scales (FWHM$_s$ > 100 Å) the model underestimates the rms by $\sim 4$ per cent. For smaller (and more relevant) scales the
model fares progressively better.

We have repeated this exercise for all sets of simulations listed in Table 1 and have always found the same good agreement.
In addition, we have repeated the calculations in section 2.4 and the analysis of the simulated data for the case of a top hat
smoothing function and these also agree very well. Thus we conclude that the errors in determining any fluctuations of the
Lyα absorption around its expected mean and in estimating their significance will be dominated by the uncertainties in the
assumed values of the parameters $\beta$, $\mu_b$, $\sigma_b$ and to lesser extent $\gamma$ and the overall normalisation. Any errors made in any of
the approximations of the previous sections are small compared to these uncertainties.


3.3 Sensitivity 

With all the calculations and parameter values in place we can now answer the questions: ‘How statistically significant is an
enhancement of the local absorption line number density over the mean line number density at redshift $z$ by a factor of $\delta_n$
on the scale of $x\,h^{-1}$ Mpc?’ and ‘At what redshift is an overdensity of $\delta_n$ on the scale of $x\,h^{-1}$ Mpc most significant?’ To
address these questions we plot the quantity

$$\frac{e^{-\tau_{\rm eff}} - e^{-\delta_n \tau_{\rm eff}}}{\sigma_G}$$

in Figures 6(a) and (b) as a function of $\delta_n$ and $z$ respectively for a scale of $5\,h^{-1}$ proper Mpc (FWHM of smoothing Gaussian),
assuming the parameters of S1. From Figure 6(a) we see that for a given redshift we can expect a maximum signal which
cannot be exceeded. This is due to saturation as the number density of absorption lines increases rapidly towards higher
redshift. Figure 6(b) tells us that for a given level of overdensity there is an optimum redshift at which this level of overdensity
will produce a maximum signal.

Figure 4. Cross section of the mean transmission triangle (c.f. Figure 1) at FWHM$_s$ = 3.2 Å. The dashed line shows the prediction of
equation (28).

At this point it is necessary to comment on the exact significance of, e.g., a ‘$3\sigma$ event’. For small smoothing scales the
pdf of $G$ is inherently non-Gaussian such that we expect the probability of $G$ lying within $3\sigma$ of the mean to be smaller than
0.9973. In fact the pdf is skewed such that the probability of a $+3\sigma$ event (a void) is lower than the probability of a $-3\sigma$
event (a cluster). At larger smoothing scales the Central Limit Theorem guarantees Gaussianity. Thus a $3\sigma$ event at large
smoothing scales is statistically more significant than a similar event at small smoothing scales. This additional complication
must be kept in mind.


3.4 Comparison to TPCF 


Groups of QSOs that are closely spaced in the plane of the sky can be used to map out the large-scale 3-dimensional structure
of the intervening absorbing gas by identifying absorption features that are approximately coincident in redshift space in two
or more spectra. One of the advantages of the analysis presented here is that it can easily be applied to the spectra of such
groups: the transmission triangles of the different spectra are simply averaged where they overlap. For sight-line separations
of several arcminutes, different lines of sight will not intersect the same absorber, so that according to our null-hypothesis
of an unclustered Lyα forest different lines of sight are uncorrelated. Therefore the variance of a mean transmission triangle
(averaged over multiple lines of sight) at $(\lambda, \sigma_s)$ is simply given by $\sigma_G^2(\lambda, \sigma_s)$ divided by the number of triangles overlapping
at $(\lambda, \sigma_s)$. Thus the signal of any structure extending across several lines of sight will be enhanced.
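
The averaging of transmission triangles from several lines of sight can be sketched as follows in Python. The sketch assumes that all triangles have already been resampled onto a common $(\lambda, \sigma_s)$ grid and that positions not covered by a given spectrum are marked with NaN; both conventions, as well as the function name, are our own.

```python
import numpy as np


def combine_triangles(triangles, sigma_g2):
    """Average overlapping triangles and scale the variance by the overlap count.

    triangles : sequence of 2-D arrays on a common grid, NaN where undefined.
    sigma_g2  : expected single-sight-line variance on the same grid (or a scalar).
    """
    stack = np.stack([np.asarray(t, dtype=float) for t in triangles])
    n_overlap = np.sum(~np.isnan(stack), axis=0)
    safe_n = np.where(n_overlap > 0, n_overlap, 1)   # avoid division by zero
    mean = np.where(n_overlap > 0, np.nansum(stack, axis=0) / safe_n, np.nan)
    variance = np.where(n_overlap > 0, np.asarray(sigma_g2, dtype=float) / safe_n, np.nan)
    return mean, variance, n_overlap
```

Grid points covered by no spectrum remain NaN in both the mean and the variance, so they can be masked out before significances are computed.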

In order to compare our analysis directly to a ‘traditional’ two-point correlation function analysis we have simulated
spectra of a close group of QSOs where the absorbers are clustered. In view of the modern, large hydrodynamic simulations of
structure formation which reproduce many of the observed properties of the Lyα forest, the simulations presented here must
be understood in the sense of a toy model. The advantage of our simulation is the flexibility to model different clustering
characteristics, thus enabling us to test our method comprehensively. It is not important for these particular clustering models
to describe reality accurately since our aim is to compare the relative sensitivity of a two-point correlation function analysis
and the technique we have developed here. The validity of this test is unlikely to depend strongly on the type of clustering.
We have explored two clustering scenarios:


1) Absorbers are clustered according to the gravitational quasi-equilibrium distribution (GQED) function (Saslaw & 
Hamilton 1984). We implement this scenario by following an approach first developed by Neyman & Scott (1952) and described 
Wavelength (Å) 

Figure 5. (a) Cross section of the rms transmission triangle (c.f. Figure^) at FWHM_s = 3.2 Å. The dashed line shows the prediction 
of equation (pM. 


by Sheth & Saslaw (1994): we distribute clusters of absorbers randomly in a cosmological volume and draw the number of 
absorbers of a given cluster from the distribution (Saslaw 1989) 


$$ h(N) = \begin{cases} b, & N = 0 \\ \dfrac{(1-b)}{N!}\,(Nb)^{N-1}\,e^{-Nb}, & N > 0. \end{cases} \tag{35} $$


b is the only parameter of the model and is defined as the ratio of potential and kinetic energies of the cluster (0 < b < 1). It 
is related to the two-point correlation function by (Saslaw & Hamilton 1984) 

$$ b = -\frac{W}{2K} = \frac{2\pi G m^2 n}{3kT} \int_0^\infty \xi(r)\, r\, \mathrm{d}r, \tag{36} $$

where T and m are the temperature and mass of the cluster, n is the average number density and k and G have their usual 
meanings. We choose b = 0.3 (Sheth & Saslaw (1994) estimate b_0 ~ 0.75 for galaxies) and members of a cluster have a velocity 
dispersion of 500 km s⁻¹. We assume clusters to be spherical and distribute absorbers within a cluster according to a King 
profile (King 1966). 
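
The cluster-occupation draw in scenario 1 can be sketched as follows. This is only an illustration under an explicit assumption: that the occupation distribution has the Borel-like form h(0) = b and h(N) = (1 - b) (Nb)^(N-1) e^(-Nb) / N! for N > 0, which is normalised and has a mean of one absorber per cluster. The function names, the inverse-CDF sampling scheme and the cut-off n_max are hypothetical choices made for this example.

```python
import math
import random


def cluster_size_pmf(n, b):
    """Assumed occupation probability h(N) for 0 < b < 1."""
    if n == 0:
        return b
    # Evaluate in log space so that large N does not overflow.
    log_p = (math.log(1.0 - b) + (n - 1) * math.log(n * b)
             - n * b - math.lgamma(n + 1))
    return math.exp(log_p)


def draw_cluster_size(b, rng, n_max=10000):
    """Draw one cluster occupation number by inverting the cumulative pmf."""
    u = rng.random()
    cumulative = 0.0
    for n in range(n_max + 1):
        cumulative += cluster_size_pmf(n, b)
        if u < cumulative:
            return n
    return n_max  # Only reached through rounding in the far tail.


rng = random.Random(1)
sizes = [draw_cluster_size(0.3, rng) for _ in range(20000)]
```

With b = 0.3 roughly 30 per cent of the draws are empty clusters, and the sample mean of sizes should be close to one.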

2) Absorbers form ‘walls’. Considering the connection of the Lyα forest with galaxies at low redshift and the repeated 
findings of independent groups that galaxies form sheet- and wall-like structures (Broadhurst et al. 1990; Ettori, Guzzo, & 
Tarenghi 1997; Einasto et al. 1997; Connolly et al. 1997; Di Nella et al. 1996) it is conceivable that such structures may also 
be found in the Lyα forest. In addition, at high redshift several hydrodynamic simulations have shown that the absorbing gas 
forms filaments, sheets and wall-like structures (Cen et al. 1994; Miralda-Escude et al. 1996; Cen & Simcoe 1997; Hernquist 
et al. 1996; Zhang, Anninos, & Norman 1995; Petitjean, Mucket, & Kates 1995; Mucket et al. 1996; Riediger, Petitjean, & 
Mucket 1998; Wadsley & Bond 1997; Bond & Wadsley 1998), although these structures are of a smaller scale than we are 
interested in. In any case, we have included this model where walls of absorber overdensities extend across several lines of 
sight in order to demonstrate the better sensitivity of our analysis compared to a conventional cross-correlation analysis of 
fitted absorption lines. 

For both scenarios we have computed 100 sets of simulated spectra of a close group of four QSOs using the parameters 
of SI. 

Figure (7) shows the result of our new analysis for the case of GQED clustering. For all spectra we have computed 
their transmission triangles, subtracted the mean given by equation (^) and divided by the rms given by the square-root of 
equation (^I|). We shall refer to the result as ‘reduced’ transmission triangles. In the reduced triangles all residual fluctuations 
are given in terms of their statistical significance rather than their absolute magnitude. In panel (a) of Figure (7) we plot 
the histogram of the minimum values (maximally significant overdense absorption) measured in these reduced transmission 
Figure 5 - continued (b) Cross section of the rms transmission triangle (c.f. Figure^) at z = 2.87. The dashed line shows the prediction 
of equation (pM. 


triangles of the individual spectra. The distribution peaks at −3.6σ but in a significant fraction of cases (~ 40 per cent) we 
have a greater than 4σ detection. Panels (b) and (c) show that these detections are not spurious but actually arise from the 
clusters. In panel (b) we plot the distribution of scales (FWHM of smoothing Gaussian) at which the minima of panel (a) 
are detected. Clearly we recover the correct velocity dispersion of the clusters. We loosely define the ‘strongest’ cluster in a 
spectrum as the cluster with the highest total column density and plot in panel (c) the histogram of differences in velocity 
space between the strongest clusters and the detected minima, Δv. Although there is clearly a peak at 0 km s⁻¹ of the correct 
width, there are a large number of cases where the detected minima do not coincide with the strongest clusters. However, 
these mismatches do not all indicate spurious detections. Rather, they are mostly due to our definition of the strongest cluster, 
since it does not guarantee that the strongest cluster will produce the maximum absorption. 
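
The construction of a ‘reduced’ triangle and the search for its most significant minimum can be sketched as below. The predicted mean and rms are taken as given inputs here, whereas in the paper they come from the analytic expressions referred to above. The function names and the nested-list layout indexed [scale][wavelength] are assumptions made for this illustration.

```python
def reduce_triangle(triangle, mean, rms):
    """Express every pixel as (G - mean) / rms, i.e. in units of sigma."""
    return [[(g - m) / s for g, m, s in zip(g_row, m_row, s_row)]
            for g_row, m_row, s_row in zip(triangle, mean, rms)]


def most_significant_minimum(reduced, scales, wavelengths):
    """Return (value, scale, wavelength) of the lowest reduced pixel."""
    best = None
    for i, row in enumerate(reduced):
        for j, value in enumerate(row):
            if best is None or value < best[0]:
                best = (value, scales[i], wavelengths[j])
    return best


# Toy data: two smoothing scales, three wavelength pixels.
triangle = [[0.70, 0.40, 0.68],
            [0.69, 0.52, 0.70]]
mean = [[0.70, 0.70, 0.70],
        [0.70, 0.70, 0.70]]
rms = [[0.10, 0.10, 0.10],
       [0.05, 0.05, 0.05]]
reduced = reduce_triangle(triangle, mean, rms)
value, scale, wavelength = most_significant_minimum(
    reduced, scales=[120.0, 500.0], wavelengths=[4590.0, 4600.0, 4610.0])
```

In this toy case the deeper raw dip sits at the small scale, but the most significant one is found at the larger scale because the rms there is smaller.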

We now compare the results above with a two-point correlation function (tpcf) analysis. We compute both ‘real’ and 
‘observed’ tpcfs from two separate lists of absorption lines. A ‘real’ list is derived from the input line list used to create the 
spectrum by simply applying an equivalent width detection threshold. To mimic blending due to instrumental resolution we 
generate an ‘observed’ line list from the input line list by blending all lines that lie within one FWHM_LSF of each other into a 
single line and imposing an equivalent width detection limit. The position of the blended line is taken as the equivalent width 
weighted average of its components. We estimate the 5σ equivalent width detection limit in our simulated data to be 0.26 Å. 
The two-point correlation function is calculated as 


$$ \xi(\Delta v) = \frac{N_{\rm obs}(\Delta v)}{N_{\rm exp}(\Delta v)} - 1, \tag{37} $$


where N_obs and N_exp are the observed and expected number of pairs at separation Δv. We account for the evolution of the 
mean line number density in the calculation of N_exp. The individual line correlation functions of a set of four spectra are 
averaged to increase the signal to noise ratio. 
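
The line-list blending and the pair-count estimator can be illustrated with the short sketch below. Several choices in it are assumptions rather than details given in the text: lines are chained together whenever consecutive lines are closer than one FWHM of the line spread function, the equivalent width of a blend is taken as the sum of its components, and the expected pair counts are supplied by the caller (for example from random line lists) instead of being derived from the evolving line density. The estimator is taken to be the standard form N_obs / N_exp - 1. All function names are hypothetical.

```python
def blend_lines(lines, fwhm_lsf, w_limit):
    """Blend (position, equivalent_width) lines closer than fwhm_lsf.

    Returns the blended lines whose summed equivalent width is >= w_limit.
    """
    groups = []
    for pos, ew in sorted(lines):
        if groups and pos - groups[-1][-1][0] <= fwhm_lsf:
            groups[-1].append((pos, ew))
        else:
            groups.append([(pos, ew)])
    blended = []
    for group in groups:
        total_ew = sum(ew for _, ew in group)
        # Position of the blend: equivalent-width-weighted average.
        centre = sum(pos * ew for pos, ew in group) / total_ew
        if total_ew >= w_limit:
            blended.append((centre, total_ew))
    return blended


def pair_counts(velocities, bin_width, n_bins):
    """Histogram of pairwise velocity separations in bins of bin_width."""
    counts = [0] * n_bins
    for i in range(len(velocities)):
        for j in range(i + 1, len(velocities)):
            k = int(abs(velocities[i] - velocities[j]) // bin_width)
            if k < n_bins:
                counts[k] += 1
    return counts


def tpcf(n_obs, n_exp):
    """Two-point correlation function estimate per bin: N_obs / N_exp - 1."""
    return [o / e - 1.0 if e > 0 else None for o, e in zip(n_obs, n_exp)]


lines = [(4000.0, 0.2), (4000.5, 0.2), (4010.0, 0.1), (4020.0, 0.5)]
observed = blend_lines(lines, fwhm_lsf=1.0, w_limit=0.26)
counts = pair_counts([0.0, 100.0, 250.0, 1000.0], bin_width=120.0, n_bins=3)
xi = tpcf(counts, [0.5, 1.0, 2.0])
```

In the toy line list the two weak lines at 4000.0 and 4000.5 merge into a single detectable line, while the isolated weak line at 4010.0 falls below the detection limit and is lost.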

In panel (a) of Figure (8) we show the distribution of the maximally significant values detected in the averaged ‘observed’ 
(solid line) and ‘real’ (dotted line) two-point correlation functions. For an underlying clustered set of absorption lines, these 
distributions will be slightly sensitive to the bin size chosen in computing the tpcfs. To some extent this reflects one of the 
difficulties with the tpcf: one must choose a bin size a priori, without knowing what an ‘optimal’ size might be. 
In practice, observers often choose the smallest convenient size which is larger than the instrumental resolution. We have done 
similarly in this experiment and have chosen 120 km s⁻¹. 

The solid histogram in panel (a) peaks narrowly at 1.5σ. Only 3 per cent of the detections are > 3σ. Panel (b) shows the 
correlation scales at which the maxima are detected and we see that at least 50 per cent of the detections are spurious. The 
dotted histograms show the results for the ‘real’ tpcfs: significant detections (a) at the right scale (b). However, a comparison 
Figure 6. (a) Expected signal in units of σ_G of an overdensity of absorption lines of scale 5 h⁻¹ proper Mpc (FWHM of smoothing 
Gaussian) at redshifts 2, 3, and 5. 



Figure 6 - continued (b) Same as (a) as a function of redshift for the indicated overdensities. 


with panel (a) of Figure (7) shows that a tpcf analysis, even with infinite resolution (but finite S/N) and a perfect line fitting 
algorithm, does only marginally better in uncovering the presence of clustering than our new analysis using intermediate 
resolution. 

Figure (9) shows the results for the case of a ‘wall’ of absorbers which is simulated by multiplying the redshift distribution 
of absorbers with a top hat function. The simulated wall is located at z = 2.78, it is 5 h⁻¹ Mpc thick and is overdense by a 
factor of δ_n = 2. As described above we have averaged the individual transmission triangles of each set of four spectra. The 
Figure 7. Distributions of (a) values and (b) scales of minima detected in reduced transmission triangles of spectra with GQED 
clustering. (c) Distribution of differences between the positions of the ‘strongest’ clusters (see text) and the minima of (a). 


distributions of the values, positions and scales of the minima detected in the reduced averaged triangles are plotted in panels 
(a), (b) and (c) respectively. All detections are above the 3σ level and from panel (b) we see that all detections are due to 
the wall. Taking the top hat shape of the wall into account, its thickness has correctly been recovered in panel (c). Using the 
peaks of the three distributions we calculate an overdensity of 2.6 (see also Figure 6). As in Figure (8) we plot in panel (d) 
the distribution of the maximum values detected in the averaged two-point correlation functions using the ‘observed’ (solid 
line) and the ‘real’ (dotted line) line lists. In addition, we performed a cross-correlation analysis and show the result as the 
dashed histogram. Both auto- and cross-correlations fail to deliver a significant result. In fact, even with infinite resolution 
and a perfect line fitting algorithm, the auto-tpcf analysis does a worse job of uncovering the ‘wall’ than our analysis using 
intermediate resolution. 
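
A toy version of the ‘wall’ construction, in which the absorber redshift distribution is multiplied by a top hat, is sketched below. It assumes a uniform underlying redshift distribution purely for simplicity, whereas the simulations described here use an evolving line density, and it uses rejection sampling with the total number of absorbers held fixed. The function name and all parameter names are hypothetical.

```python
import random


def draw_redshifts_with_wall(n, z_min, z_max, wall_lo, wall_hi,
                             overdensity, rng):
    """Draw n redshifts from a uniform density boosted inside a top hat.

    overdensity must be >= 1; inside [wall_lo, wall_hi] the density is
    overdensity times the density outside.
    """
    redshifts = []
    while len(redshifts) < n:
        z = rng.uniform(z_min, z_max)
        weight = overdensity if wall_lo <= z <= wall_hi else 1.0
        # Accept with probability proportional to the top-hat weight.
        if rng.random() < weight / overdensity:
            redshifts.append(z)
    return redshifts


rng = random.Random(42)
zs = draw_redshifts_with_wall(20000, 2.5, 3.0, 2.77, 2.79, 2.0, rng)
in_wall = sum(1 for z in zs if 2.77 <= z <= 2.79) / len(zs)
```

For these toy numbers the expected fraction of absorbers inside the wall is 0.04 / 0.52, about 7.7 per cent, compared with 4 per cent for an unclustered distribution.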

For both cases discussed above we have demonstrated that our new analysis is substantially more sensitive to the 
presence of non-random structure in the Lyα forest than a traditional two-point correlation function analysis when applied to 
Figure 8. Distributions of (a) values and (b) correlation scales of averaged two-point correlation function maxima using ‘observed’ line 
lists (solid lines) and ‘real’ line lists (dashed lines). 


intermediate resolution data. To further illustrate this point we show in Figure (10) the same distributions as in Figures (7), 
(8) and (9) for the case where absorbers are randomly distributed. We note that the distributions of transmission minima in 
Figures (7) and (9) differ substantially from the one in Figure (10), whereas the distributions of tpcf maxima are very similar. 
In panel (b) of Figure (10) we see the effect of the non-Gaussian statistics at small smoothing scales as discussed above; the 
minimum value in a transmission triangle is more likely to occur at small smoothing scales than at large ones which is why 
the minima are not evenly distributed over all scales as are the maxima of the tpcf. 


4 CONCLUSIONS 

In this paper we have developed a new technique to test for non-random structure in the Lyα forest. This new technique 
does not require line fitting but is rather based on the statistics of the transmitted flux. We have tested the relevant analytic 
calculations and approximations against simulated data and have found excellent agreement. We have argued that the accuracy 
of our method is limited by the precision of the continuum fit and by the errors in the line distribution parameters rather 
than by errors introduced by analytic approximations. We have shown our new analysis to be substantially more sensitive to 
non-randomness in intermediate resolution data than a traditional two-point correlation function analysis. Finally, we have 
presented evidence that, in the case of a coherent structure of absorbers extending across several lines of sight, our analysis 
using intermediate resolution data is at least comparable, if not superior, in sensitivity to a tpcf analysis using high resolution 
data. 

The next step is to apply our method to real data. In a forthcoming paper we will present the results of our analysis of 
the spectra of a close group of ten QSOs. 
Figure 9. Distributions of (a) values, (b) positions and (c) scales of minima detected in reduced averaged transmission triangles, where 
absorbers form a ‘wall’ at 4600 Å. (d) Distribution of maxima of averaged auto- (solid line) and cross-correlation (dashed line) functions 
using ‘observed’ and ‘real’ (auto only, dotted line) line lists. 


Cen R., Simcoe R. A., 1997, ApJ, 483, 8 
Chen H.-W., Lanzetta K. M., Webb J. K., Barcons X., 1998, ApJ, 498, 77 
Connolly A. J., Szalay A. S., Romer A. K., Nichol B., Holden B., Koo D., Miyaji T., 1997, in Proceedings of the 18th Texas Symposium 
on Relativistic Astrophysics, astro-ph/9702025 
Cooke A. J., Espey B., Carswell R. F., 1997, MNRAS, 284, 552 
Di Nella H., Couch W. J., Paturel G., Parker Q. A., 1996, MNRAS, 283, 367 
Einasto J. et al., 1997, Nat, 385, 139 
Ettori S., Guzzo L., Tarenghi M., 1997, MNRAS, 285, 218 
Fernandez-Soto A., Lanzetta K. M., Barcons X., Carswell R. F., Webb J. K., Yahil A., 1996, ApJ, 460, L85 
Gnedin N. Y., 1998, MNRAS, 294, 407 
Hellsten U., Dave R., Hernquist L., Weinberg D. H., Katz N., 1997, ApJ, 487, 482 
Hernquist L., Katz N., Weinberg D. H., Miralda-Escude J., 1996, ApJ, 457, L51 

[Figure 10 axis labels: Max(tpcf) (σ); Smoothing scale (km s⁻¹); Correlation scale (km s⁻¹)] 


Figure 10. Distributions of (a) values and (b) scales of minima detected in reduced individual (solid lines) and averaged (dotted lines) 
transmission triangles, where absorbers are distributed randomly. The dotted histograms were renormalised. Distributions of (c) values 
and (d) correlation scales of maxima of averaged two-point correlation functions using ‘observed’ (solid lines) and true (dotted lines) line 
lists. 